phase transition
Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models
Mun, Kyunghoo, Rosenzweig, Matthew
We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coefficients satisfy a certain decay condition, we prove that the critical coupling strength $K_c$ coincides with the linear stability threshold $K_\#$ of the uniform distribution and that the phase transition is continuous, in the sense that the uniform distribution is the unique global minimizer at criticality. The proof is based on a sharp coercivity estimate for the free energy obtained from the constrained Lebedev--Milin inequality. We apply this result to three motivating models for which the exact value of the phase transition and its (dis)continuity in terms of the model parameters was not fully known. For the two-dimensional Doi--Onsager model $W(θ)=-|\sin(2πθ)|$, we prove that the phase transition is continuous at $K_c=K_\#=3π/4$. For the noisy transformer model $W_β(θ)=(e^{β\cos(2πθ)}-1)/β$, we identify the sharp threshold $β_*$ such that $K_c(β) = K_\#(β)$ and the phase transition is continuous for $β\leq β_*$, while $K_c(β)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Phase transition on a context-sensitive random language model with short range interactions
Toji, Yuma, Takahashi, Jun, Roychowdhury, Vwani, Miyahara, Hideyuki
Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii--Kosterlitz--Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- (3 more...)
- North America > United States > California (0.14)
- North America > United States > Massachusetts (0.04)
- North America > Canada (0.04)
- (2 more...)
Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics
We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon--Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task ($p = 19$--$113$), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime, the reproducible enhancement of energy variance suggests a critical-like crossover accompanying representational reorganization. Our framework provides a unified statistical-mechanical perspective on attention scaling, training dynamics, and positional encoding, interpreting the phenomena as emergent properties of an effective thermodynamic system rather than isolated heuristics. Although the present results indicate finite-size crossover behavior rather than a strict phase transition, they motivate further investigation into scaling limits of deep architectures through fluctuation-based observables.
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- Information Technology > Artificial Intelligence > Vision (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Republic of Türkiye > Bingoel Province > Bingol (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- (3 more...)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.32)
The committee machine: Computational to statistical gaps in learning a two-layers neural network
Benjamin Aubin, Antoine Maillard, jean barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová
Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows to perform optimal learning in polynomial time for a large set of parameters.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (2 more...)